The present invention relates generally to the field of information processing, and more particularly to information processing within a database system.
With the continuing development of information technology, information discovery is becoming more important. Information can be retrieved from a plurality of documents by using a keyword search. A full text index may be needed for the plurality of documents to facilitate the keyword search. Some documents, such as PDF files, Office files, and/or compressed files, contain unstructured data. Unstructured data may be information that may not be organized according to a predefined model (e.g., codepage) but may contain dates and times. Structured data may be information structured in a way that can be manipulated and processed according to predefined models that may rely on patterns. A codepage may be a table of values that describes the characters of a document. Codepages may be used to structure data within a document.
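By way of illustration only, a full text index of the kind mentioned above may be sketched as an inverted index that maps each word to the documents containing it. The function names `build_index` and `search`, the sample file names, and the sample text are hypothetical and are not taken from any particular database system; the sketch assumes the document text has already been extracted and decoded.

```python
from collections import defaultdict
import re


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Split the text into word tokens; real systems may tokenize differently.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted ids of documents that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical sample documents (already-extracted plain text).
documents = {
    "report.txt": "Quarterly revenue grew in March",
    "notes.txt": "Meeting notes: revenue forecast and hiring",
    "memo.txt": "Office closed on Friday",
}
index = build_index(documents)
print(search(index, "revenue"))
```

Building the index once may allow each later keyword search to be answered by a single lookup rather than by scanning every document.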